Practical DataOps by Harvinder Atwal
Author: Harvinder Atwal
Language: eng
Format: epub
ISBN: 9781484251041
Publisher: Apress
Concept Drift
Machine learning and deep learning models learn rules from historical input and output data and apply them to make predictions from new input data. This assumes that the relationship between inputs and outputs in new data remains the same as in the historical data, so the model can be expected to make useful predictions for unseen data. The assumption may hold in some cases, for instance where the relationship is effectively fixed by nature, such as an image recognition model that identifies cats. However, other relationships, like customer purchasing behavior, spam email detection, or product quality as machinery wears, will eventually change in a process known as concept drift.
A passive strategy for dealing with concept drift is to periodically retrain models on a window of recent data. However, this is not an option in some circumstances because of harmful feedback loops. Imagine a recommender system where specific customers see recommendations for product X based on previous purchasing relationships. The data for these customers is now biased because the recommendations themselves make them even more likely to buy product X. A model trained on this data will boost the recommendation for product X further, even if in the outside world the original relationship has changed and customers now prefer product Y to product X.
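The periodic, windowed retraining described above can be sketched as follows. This is a minimal illustration and not the book's implementation: the class name `SlidingWindowRetrainer`, its parameters, and the toy "model" (which simply predicts the mean of recent targets) are all assumptions chosen to keep the example self-contained.

```python
from collections import deque


class SlidingWindowRetrainer:
    """Toy model that is refit on a fixed-size window of recent targets.

    Hypothetical example: the "model" just predicts the mean of the
    targets currently in the window.
    """

    def __init__(self, window_size, retrain_every):
        # deque(maxlen=...) silently discards the oldest observation
        # once the window is full.
        self.window = deque(maxlen=window_size)
        self.retrain_every = retrain_every
        self.seen = 0
        self.prediction = 0.0

    def observe(self, y):
        """Record one new target and retrain on a fixed schedule."""
        self.window.append(y)
        self.seen += 1
        if self.seen % self.retrain_every == 0:
            self.retrain()

    def retrain(self):
        """Refit using only the data still inside the window."""
        self.prediction = sum(self.window) / len(self.window)


# The underlying relationship shifts from 1.0 to 5.0 halfway through.
model = SlidingWindowRetrainer(window_size=50, retrain_every=25)
for y in [1.0] * 100 + [5.0] * 100:
    model.observe(y)
print(model.prediction)
```

Because old observations fall out of the window, the refit model follows the new relationship after the shift. The sketch does nothing about the feedback problem: if the observed targets are themselves influenced by the model's predictions, the window only contains biased data.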
One solution to the feedback loop problem is to randomly hold out some instances from model predictions. The held-out instances provide a baseline against which to measure model performance and an unbiased dataset for model training.
Another strategy is to use a trigger to initiate a model update. Concept drift can be challenging to discover, and there are multiple algorithms for detecting it. Commonly used algorithms include the Drift Detection Method (DDM), the Early Drift Detection Method (EDDM), the Geometric Moving Average Detection Method (GMADM), and the Exponentially Weighted Moving Average chart detection method. These algorithms can be built into continuous diagnostic monitoring of performance to raise an alert when a model requires retraining. Predictions can then be turned off for some or all instances to generate unbiased data for retraining and so avoid the feedback loop. Because retraining happens only when the model is not performing well, the cost of training and the opportunity cost of lost prediction benefit are lower than with periodic retraining or a permanent random holdout.
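To make the trigger idea concrete, here is a minimal sketch of a detector in the style of the Drift Detection Method. It tracks the running error rate p and its standard deviation s = sqrt(p(1 - p) / n), remembers the point where p + s was lowest, and signals a warning or drift when p + s rises more than two or three minimum standard deviations above that point. The class and function names, the 30-sample warm-up, and the threshold defaults are assumptions for illustration, not code from the book or from any particular library.

```python
import math


class DDM:
    """Sketch of a DDM-style detector over a stream of prediction errors."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after the model has been retrained."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Feed one outcome (True = misprediction); return the current state."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"  # not enough data for a reliable estimate
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # best performance seen so far
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


def first_drift_index(error_stream):
    """Return the 1-based position of the first drift signal, or None."""
    detector = DDM()
    for i, is_error in enumerate(error_stream, start=1):
        if detector.update(is_error) == "drift":
            return i
    return None


# Synthetic stream: 10% errors for 200 predictions, then 50% errors.
stream = [(i % 10 == 0) if i <= 200 else (i % 2 == 0) for i in range(1, 401)]
print(first_drift_index(stream))
```

In a monitoring pipeline, a "drift" result would be the trigger to switch off predictions for some instances and schedule retraining. One limitation of this sketch: if no errors occur during the warm-up, both minima are zero and the first later error signals drift, so a production detector would need to handle that case.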
